This tutorial is meant to be a step-by-step guide for likelihood-based assignment methods, like those used in Hobson et al. 2009, van Wilgenburg et al. 2011, Vander Zanden et al. 2014, and Kusack et al. 2022. Using this code, you will be able to probabilistically assign individuals to origin by comparing stable-isotopes values of tissues to established spatial surfaces for stable isotope values within precipitation.

We will mainly use functions within the AssignR package (see Ma et al. 2020), but mention some functions within the isocat package (see Campbell et al. 2020. Our approach is similar to the overall workflow, but with a few changes.

For this tutorial we provide all of the necessary data. All other data will be generated within R, including the isotope data for the assigned individuals.


1 - Load packages

The following packages are required to run this Rmd script. They will automatically install and load if you do not currently have them installed.

if (!require('assignR')) install.packages('assignR'); library('assignR')
if (!require('raster')) install.packages('raster'); library('raster')
if (!require('terra')) install.packages('terra'); library('terra')
if (!require('sf')) install.packages('sf'); library('sf')
if (!require('sp')) install.packages('sp'); library('sp')
if (!require('rasterVis')) install.packages('rasterVis'); library('rasterVis')
if (!require('rnaturalearth')) install.packages('rnaturalearth'); library('rnaturalearth')
if (!require('dplyr')) install.packages('dplyr'); library('dplyr')
if (!require('RColorBrewer')) install.packages('RColorBrewer'); library('RColorBrewer')
if (!require('ggplot2')) install.packages('ggplot2'); library('ggplot2')
if (!require('plotly')) install.packages('plotly'); library('plotly')
if (!require('rdryad')) install.packages('rdryad'); library('rdryad')

custom.palette <- colorRampPalette(brewer.pal(9, "YlGnBu")) # Color palette

2 - Isotope data

For this tutorial, we will use data collected during my PhD thesis on American Black Ducks (see link for species description). These data were collected from hunters, via the parts collection and species composition surveys, from across their range in eastern North America during hunting season (Sep-Jan). From these ducks, we have primary feather stable-hydrogen values (\(\delta\)2Hf; relative to the Vienna Standard Mean Ocean Water, VSMOW). These data are published and publicly available here, but we won’t repeat the analysis from that paper.

Based on timing of capture and knowing the moult schedule for black ducks, we know that these feathers were grown on the breeding grounds or moulting sites and we can use these feathers to assign individuals to breeding/moult origin. For your own data, it is important to understand moult timing and life history of your species to know exactly where the tissues were grown.

duck.d2h <- read.csv(rdryad::dryad_files_download(1697805)[[1]]) # Download data directly from Dryad

glimpse(duck.d2h) # Check data structure
Rows: 664
Columns: 23
$ id.lab       <chr> "UWO10300", "UWO10301", "UWO10302", "UWO10303", "UWO10304…
$ id.full      <chr> "A192066", "M274723", "M274758", "M286504", "M317005", "M…
$ VSMOW        <dbl> -126.0, -125.9, -110.5, -117.1, -113.2, -103.6, -97.2, -1…
$ county.id    <chr> "MI Tuscola", "MI Iosco", "MI Iosco", "WI Green Lake", "M…
$ species      <chr> "ABDU", "ABDU", "ABDU", "ABDU", "ABDU", "ABDU", "ABDU", "…
$ country      <chr> "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "…
$ season       <chr> "2017-2018", "2017-2018", "2017-2018", "2017-2018", "2017…
$ state.prov   <chr> "MI", "MI", "MI", "WI", "MI", "MI", "WI", "IL", "IL", "OH…
$ county       <chr> "Tuscola", "Iosco", "Iosco", "Green Lake", "Benzie", "Ion…
$ lat.dec      <dbl> 43.46972, 44.36015, 44.36015, 43.80162, 44.63980, 42.9500…
$ long.dec     <dbl> -83.41717, -83.64132, -83.64132, -89.04051, -86.01980, -8…
$ age          <chr> "I", "I", "I", "I", "I", "I", "I", "I", "I", "I", "I", "I…
$ sex          <chr> "F", "M", "M", "F", "M", "F", "M", "F", "F", "M", "F", "F…
$ mm           <int> 11, 10, 11, 10, 10, 10, 11, 12, 11, 12, 12, 11, 10, 12, 1…
$ dd           <int> 8, 11, 2, 26, 10, 28, 13, 26, 25, 31, 3, 28, 28, 31, 19, …
$ yy           <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 201…
$ ABDU_region  <chr> "interior", "interior", "interior", "interior", "interior…
$ county.lat   <dbl> 43.46972, 44.36015, 44.36015, 43.80162, 44.63980, 42.9500…
$ county.long  <dbl> -83.41717, -83.64132, -83.64132, -89.04051, -86.01980, -8…
$ VPDB         <dbl> NA, NA, NA, NA, NA, NA, -25, NA, NA, NA, NA, NA, NA, NA, …
$ AIR          <dbl> NA, NA, NA, NA, NA, NA, 8.5, NA, NA, NA, NA, NA, NA, NA, …
$ date         <int> 312, 284, 306, 299, 283, 301, 317, 360, 329, 365, 337, 33…
$ hunting.date <int> 68, 40, 62, 55, 39, 57, 73, 116, 85, 121, 93, 88, 57, 121…

To make this more manageable, we will just select juvenile birds harvested in Ontario.

duck.d2h <- filter(duck.d2h, age == "I") %>% # Select the juveniles
  filter(state.prov == "ON") %>% # select the birds harvested in Ontario
  select(id.lab, VSMOW, long.dec, lat.dec)

nrow(duck.d2h) # sample size
[1] 41

So that leaves us with 41 juvenile black ducks harvested in Ontario across 2 hunting seasons (2017-2018, 2018-2019).


3 - Shapefiles

Before we start working with isoscapes, we should ensure that all the necessary polygons for the assignment are loaded. These polygons are: North America and a breeding range. For the North America polygon, we will use the rnaturalearth package to load a map of the countries/states within North America.

For the breeding range, I have provided a shapefile hosted on github (https://github.com/JacksonKusack/Isotope-Assignment-Workshop). Other easily accessible options exist for these breeding ranges for birds, such as BirdLife International or eBird which both require a simple request for access but provide the data quickly.

To be safe - we will reproject them so that they are all in the correct coordinate system (CRS). If you are unfamiliar with coordinate systems and projections in R, I won’t go through them here. Here is a quick overview. Most of the isoscape data that we are working with uses the WGS84 coordinate system (EPSG 4326). This system uses latitude and longitude coordinates on the WGS84 reference ellipsoid. This is the coordinate system commonly used by organizations that provide GIS data for the entire globe or many countries.

northamerica <- ne_countries(continent = "North America", scale = 50, returnclass = "sf") %>% # Countries
  st_transform(st_crs('EPSG:4326')) # Project to WGS84

northamerica.states <- ne_states(country =  c("Canada","United States of America"), returnclass = "sf") %>% # States and provinces
  st_transform(st_crs('EPSG:4326')) # Project to WGS84

breeding.range <- st_read("/vsicurl/https://github.com/JacksonKusack/Assignment-Workshop/raw/main/Shapefiles/ABDU_baldassarre_lowdens.shp") %>% # Load a shapefile from GitHub
  st_transform(st_crs('EPSG:4326')) # Project to WGS84
Reading layer `ABDU_baldassarre_lowdens' from data source 
  `/vsicurl/https://github.com/JacksonKusack/Assignment-Workshop/raw/main/Shapefiles/ABDU_baldassarre_lowdens.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -96.86612 ymin: 34.58858 xmax: -52.61946 ymax: 60.57192
Geodetic CRS:  NAD83

Here is what that range looks like:

plot(st_geometry(breeding.range), col = brewer.pal(9, "YlGnBu")[5], border = NA)
plot(st_geometry(northamerica), add = T)

If you are unfamiliar with spatial data in R, the three data types that we are working with in this document are (1) polygons (i.e., sequence of two-dimensional points forming a closed non-self-intersecting shape), (2) points (i.e., series of zero-dimensional points) and (3) rasters (i.e., grid of cells storing values, all of which are the same size).

For this tutorial, we will use the terra and sf packages for most functions and data manipulation, but may have to switch back to sp or raster R objects, because the functions necessitate the older versions. Moving forward, terra and sf will be the supported standards, but until all packages adopt these formats, we have to work around them. So, we will load/index/transform the data in sf format but convert to sp temporarily using the as(x, “Spatial”) function.

Now we have the shapefiles loaded, let’s see where these black ducks were harvested.

duck.points <- vect(duck.d2h, geom = c("long.dec", "lat.dec")) # create vect (points) object

plot(duck.points, col = 'red', pch = 17)
plot(st_geometry(northamerica.states), add = T)

Mostly, around southern Ontario, but some in central and western Ontario. Note some fall outside of Ontario, because these points are based on hunter estimated locations (i.e., _____ km east of _____ town), but this won’t affect these methods becuase we won’t use the harvest location here.


4 - Isoscape

Next we will download an amount-weighted growing season \(\delta\)2H precipitation isoscape (see Bowen et al. 2005 for methods), from the AssignR package. This function will save a version of this surfaces locally on your harddrive each time it’s run. This surface provides four layers: a mean and sd surface for \(\delta\)2H values and \(\delta\)18O values. For this workshop, we’ll just use the \(\delta\)2H values surfaces.

gsd <- getIsoscapes(isoType = "GlobalPrecipGS", timeout = 1200) %>% 
  projectRaster(crs = CRS(SRS_string = 'EPSG:4326'))
Refer to C:\Users\jacks\AppData\Local\Temp\RtmpKu1hzi\metadata.txt for 
  documentation and citation information
gsd <- gsd[[1:2]] # Pull out the d2H surfaces (1 and 2)

Visualizing this isoscape and error surface, we can see the we have data on hydrogen isotopes within precipitation for much of the globe! Note the error is much higher near the poles, which makes the error in other areas difficult to interpret. When we mask these later, it will be easier to see.

plot(gsd, xlab="Longitude", ylab="Latitude", col = custom.palette(16))


4.1 - Calibration

Now we have the isoscape, we need to calibrate the mean \(\delta\)2Hp values to tissue relevant values. This can be done because of predictable relationships between stable-hydrogen values within tissues (\(\delta\)2Htissues) and stable-hydrogen values within precipitation (\(\delta\)2Hp) at the location of feather growth (see Hobson et al. 2012).

At this point we could limit the isoscape to our breeding range, but we may have calibration data (i.e., known-origin tissues) from another species that could be found outside this range. Therefore, I find it easier to do this step on the global isoscape surface. This takes a bit longer, but it’s only a minor difference.

AssignR provides some raw known-origin data (i.e., location of known-origin tissues and their \(\delta\)2H values) stored the knownOrig dataset. At the time this was written, there were data from 18 publications stored in the database and 4062 unique known-origin samples. Each individual has additional metadata, but the available data isn’t complete in some cases. When looking at this data, be careful to ensure that the data are appropriate for your study and, as an extra step, match the original publication.

data("knownOrig")
knownOrig$sources[1:2] # list the dataset names and ids
   Dataset_ID                             Dataset_name
1           1             Hobson et al. 1999 Oecologia
2           2                  Hobson et al. 2012 Plos
3           3      Hobson and Wassenaar 1997 Oecologia
4           4             Clark et al. 2006 Can J Zool
5           5             Hobson et al. 2004 Oecologia
6           6                  Lott and Smith 2006 Auk
7           7         Hobson and Kohler 2015 Ecol Evol
8           8 Thompson et al. 2010 Am J Phys Anthropol
9           9    Bowen et al. 2009 Am J Phys Anthropol
10         10              Ehleringer et al. 2008 PNAS
11         11                            Wunder Plover
12         12        van Dijk et al. 2014 J Avian Biol
13         13            Neto et al. 2006 J Avian Biol
14         16                           Magozzi Towhee
15         17       Prochazka et al. 2013 J Avian Biol
16         18             Langin et al. 2007 Oecologia
17         19             Bataille Canadian Human Hair
18         20                   Hobson et al. 2019 FEE

Lets see what species are available:

unique(knownOrig$samples$Taxon)
 [1] "Danaus plexippus"            "Setophaga ruticilla"        
 [3] "Turdus migratorius"          "Setophaga coronata auduboni"
 [5] "Poecile atricapillus"        "Thryomanes bewickii"        
 [7] "Thryothorus ludovicianus"    "Spizella passerina"         
 [9] "Geothlypis trichas"          "Setophaga pensylvanica"     
[11] "Baeolophus bicolor"          "Vermivora chrysoptera"      
[13] "Catharus guttatus"           "Setophaga citrina"          
[15] "Geothlypis formosa"          "Geothlypis tolmiei"         
[17] "Oreothlypis ruficapilla"     "Cardinalis cardinalis"      
[19] "Oreothlypis celata"          "Junco hyemalis oregonus"    
[21] "Seiurus aurocapilla"         "Vireo olivaceus"            
[23] "Melospiza melodia"           "Catharus ustulatus"         
[25] "Catharus fuscescens"         "Vireo griseus"              
[27] "Cardellina pusilla"          "Hylocichla mustelina"       
[29] "Icteria virens"              "Setophaga petechia"         
[31] "Melozone aberti"             "Vermivora cyanoptera"       
[33] "Passer domesticus"           "Aimophila ruficeps"         
[35] "Poecile carolinensis"        "Troglodytes aedon"          
[37] "Dumetella carolinensis"      "Mniotilta varia"            
[39] "Lanius ludovicianus"         "Anthus spragueii"           
[41] "Euphagus carolinus"          "Empidonax minimus"          
[43] "Oreothlypis peregrina"       "Aythya affinis"             
[45] "Cyanistes caeruleus"         "Phasianus colchicus"        
[47] "Lagopus lagopus"             "Tetrao tetrix"              
[49] "Dryocopus maritus"           "Serin serin"                
[51] "Vanellus vanellus"           "Corvus corone"              
[53] "Turdus merula"               "Corvus monedula"            
[55] "Columba palumbus"            "Turtle Dove"                
[57] "Tetrastes bonasia"           "Perdix perdix"              
[59] "Anas platyrhyncos"           "Branta canadensis"          
[61] "Columba livia"               "Numenius arguata"           
[63] "Turdus pilaris"              "Turdus iliacus"             
[65] "Turdus philomelos"           "Fringilla coelebs"          
[67] "Buteo lagopus"               "Accipiter striatus"         
[69] "Falco sparverius"            "Accipiter gentillis"        
[71] "Accipiter cooperii"          "Buteo jamaicensis"          
[73] "Buteo platypterus"           "Buteo swainsoni"            
[75] "Circus cyaneus"              "Falco columbarius"          
[77] "Falco mexicanus"             "Falco perigrinus"           
[79] "Wilsonia citrina"            "Oporornis tolmiei"          
[81] "Wilsonia pusilla"            "Homo sapiens"               
[83] "Charadrius montanus"         "Anas platyrhynchos"         
[85] "Locustella luscinioides"     "Acrocephalus arundinaceus"  
[87] "Acrocephalus scirpaceus"     "Pipilo maculatus"           

Looks like there are some data for Mallard, which we can use to calibrate the isoscape for our black duck data. One option would be to use a single study that looked at Mallard, such as van Dijk et al. 2014. For that we can use the function subOrigData(…). We can specify the dataset, by number, and which marker we want. For van Dijk et al. 2014, the dataset is #12.

cal.mall <- subOrigData(marker = 'd2H', dataset = 12, ref_scale = NULL) # Extract data
215 samples are found from 39 sites

cal.mall$data <- spTransform(cal.mall$data, CRS(SRS_string = 'EPSG:4326')) # reproject the data

These datasets provide information on the source of the data (i.e., lab methods and publication information) and the data itself (e.g., tissue isotopes, age, taxon, site), as well as other metadata. For this data to work with the below functions, we need to keep it in this format, but we can always modify the data portion directly (as we did with the projection above).

str(cal.mall)
List of 4
 $ data   :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  .. ..@ data       :'data.frame':  215 obs. of  19 variables:
  .. .. ..$ Site_ID        : num [1:215] 1543 1543 1543 1543 1543 ...
  .. .. ..$ Site_name      : chr [1:215] NA NA NA NA ...
  .. .. ..$ State          : chr [1:215] NA NA NA NA ...
  .. .. ..$ Country        : chr [1:215] NA NA NA NA ...
  .. .. ..$ Site_comments  : num [1:215] NA NA NA NA NA NA NA NA NA NA ...
  .. .. ..$ Sample_ID      : num [1:215] 2722 2723 2724 2725 2726 ...
  .. .. ..$ Sample_ID_orig : chr [1:215] NA NA NA NA ...
  .. .. ..$ Dataset_ID     : num [1:215] 12 12 12 12 12 12 12 12 12 12 ...
  .. .. ..$ Taxon          : chr [1:215] "Anas platyrhynchos" "Anas platyrhynchos" "Anas platyrhynchos" "Anas platyrhynchos" ...
  .. .. ..$ Group          : chr [1:215] "Water bird" "Water bird" "Water bird" "Water bird" ...
  .. .. ..$ Source_quality : chr [1:215] "known source" "known source" "known source" "known source" ...
  .. .. ..$ Age_class      : chr [1:215] NA NA NA NA ...
  .. .. ..$ Material_type  : chr [1:215] "Feather" "Feather" "Feather" "Feather" ...
  .. .. ..$ Matrix         : chr [1:215] "Keratin" "Keratin" "Keratin" "Keratin" ...
  .. .. ..$ d2H            : num [1:215] -134 -135 -142 -132 -132 ...
  .. .. ..$ d2H.sd         : num [1:215] 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 ...
  .. .. ..$ d18O           : num [1:215] NA NA NA NA NA NA NA NA NA NA ...
  .. .. ..$ d18O.sd        : num [1:215] NA NA NA NA NA NA NA NA NA NA ...
  .. .. ..$ Sample_comments: num [1:215] NA NA NA NA NA NA NA NA NA NA ...
  .. ..@ coords.nrs : num(0) 
  .. ..@ coords     : num [1:215, 1:2] 27.4 27.4 27.4 27.4 27.4 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:215] "2722" "2723" "2724" "2725" ...
  .. .. .. ..$ : chr [1:2] "Longitude" "Latitude"
  .. ..@ bbox       : num [1:2, 1:2] -8.68 40.69 39.32 63.15
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2] "Longitude" "Latitude"
  .. .. .. ..$ : chr [1:2] "min" "max"
  .. ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +no_defs"
  .. .. .. ..$ comment: chr "GEOGCRS[\"WGS 84 (with axis order normalized for visualization)\",\n    ENSEMBLE[\"World Geodetic System 1984 e"| __truncated__
 $ sources:'data.frame':    1 obs. of  17 variables:
  ..$ Dataset_ID             : num 12
  ..$ Dataset_name           : chr "van Dijk et al. 2014 J Avian Biol"
  ..$ Citation               : chr "van Dijk JGB, Meissner W, Klaassen M. 2014. Improving provenance studies in migratory birds when using feather "| __truncated__
  ..$ Sampling_method        : chr NA
  ..$ Sample_powdered        : chr NA
  ..$ Lipid_extraction       : chr "Y"
  ..$ Lipid_extraction_method: chr "2:1 chloroform:methanol"
  ..$ Exchange               : chr "Y"
  ..$ Exchange_method        : chr "Comparative eqib"
  ..$ Exchange_T             : chr "Ambient"
  ..$ H_cal                  : chr "OldEC.1_H_1"
  ..$ O_cal                  : chr NA
  ..$ Std_powdered           : chr "Y"
  ..$ Drying                 : chr "N"
  ..$ Analysis_method        : chr "pyrolysis; CF-IRMS"
  ..$ Analysis_type          : chr "H"
  ..$ Source_comments        : num NA
 $ chains : NULL
 $ marker : chr "d2H"
 - attr(*, "class")= chr "subOrigData"

AssignR also provides methods to transform the \(\delta\)2H values into different reference scales. For details on these methods sees Magozzi et al. 2021. Simply, this allows us to combine data from different labs, if they use different keratin standards on different scales. If the ref_scale is changed though, you need to ensure that the sample data are also converted to this scale. If everything was done in the same lab or the two labs used the same methods, then you can set this to NULL.

Example:

test.transformed <- subOrigData(marker = 'd2H', dataset = 12, ref_scale = "VSMOW_H")
215 samples are found from 39 sites
215 samples from 39 sites in the transformed dataset

mean(cal.mall$data$d2H) # Untransformed mean
[1] -104.7414
mean(test.transformed$data$d2H) # Transformed mean
[1] -78.02873

Luckily for us, the isotope data from van Dijk et al. 2014 were done using keratin standards on the same scale as the sample black duck data (e.g., OldEC.1_H_1). This is the data that we will use to run the calibration.

We can calibrate the gsd isoscape and produce a new isoscape where values are predicted feather values (\(\delta\)2Hf). To do this, we can use the calRaster(…) function, providing it with our known origin data (cal.mall) and precipitation isoscape (d2h_world).

r <- calRaster(known = cal.mall, isoscape = gsd, interpMethod = 1, verboseLM = F, genplot = F) # Calibrate the mean and sd values from our isoscape
r.model <- r$lm.model # Extract model results
summary(r.model) 

Call:
lm(formula = tissue.iso ~ isoscape.iso[, 1], weights = tissue.iso.wt)

Residuals:
    Min      1Q  Median      3Q     Max 
-34.267  -9.975  -0.905   9.084  39.898 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -37.91827    3.42894  -11.06   <2e-16 ***
isoscape.iso[, 1]   1.17645    0.05909   19.91   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.11 on 213 degrees of freedom
Multiple R-squared:  0.6504,    Adjusted R-squared:  0.6488 
F-statistic: 396.3 on 1 and 213 DF,  p-value: < 2.2e-16
p <- ggplot(data = r$lm.data, aes(y = tissue.iso, x = isoscape.iso)) + 
  geom_point()  + 
  stat_smooth(method = "lm", formula = 'y ~ x') + 
  theme_classic()
  
ggplotly(p)

This calibration relationship produces the following linear equation (i.e., calibration equation):

\[\delta^2H_f = 1.18 * \delta^2H_p - 37.92\]

First, you might notice that the produced calibration equation doesn’t exactly match the one in in van Dijk et al. 2014:

\[\delta^2H_f = 1.36 * \delta^2H_p - 21.9\]

This is because the amount-weighted growing season \(\delta\)2H precipitation isoscape has been updated with additional years of precipitation data since publication. And AssignR doesn’t contain all of the data in this case. But the first point would still apply if all the data were available.

The mean surface has been updated using this calibration relationship, but the error surface has also been modified using the calibration equation. For the specific methods see see Ma et al. 2020), but the general idea here is the resulting value is incoporates the standard devation of ()2H values at each cell, the standard devation of residuals within the calibration equation, and the covariance between the two measures of variation.

Now we have a new isoscape object r which again is a rasterstack containing the calibrated mean and error surfaces.

plot(r$isoscape.rescale, xlab="Longitude", ylab="Latitude", col = custom.palette(16))

To limit this isoscape to the breeding range of the species of interest we then apply a mask function. We could use the mask argument within the calRaster(…) function, but the ‘mask’ argument does not use the actual polygon to limit the isoscape, it uses the extent/bounding box. So, we can use the functions within the terra package to manually apply the mask.

r$isoscape.rescale <- mask(rast(r$isoscape.rescale), breeding.range) %>% # Mask to breeding range
  crop(extent(breeding.range)) %>% # Crop to breeding range
  raster::stack() # Convert back to raster object

plot(r$isoscape.rescale, xlab="Longitude", ylab="Latitude", col = custom.palette(16))


5 - Assignment to Origin

Now we can attempt an assignment to origin using the function pdRaster(…). This function will produce a posterior probability of origin surface, taking into account the mean ()2Hf values at each cell, as well as our estimate of error. The probability is estimated using a Normal Probability Density function:

\[f(y|\mu_c,\sigma_c) = \frac{1}{\sigma_c \sqrt{2\pi}} e^{-\frac{1}{2}(\frac{y-\mu_c}{\sigma_c})^2 }\] Where \(f(y|\mu_c,\sigma_c)\) represents the probability that a given cell (c) within the \(\delta\)2Hf isoscape represents a potential origin for an individual of unknown origin (y), given the expected mean \(\delta\)2Hf for that cell (\(\mu_c\)) from the calibrated \(\delta\)2Hf isoscape, and the expected standard deviation (\(\sigma_c\)) of \(\delta\)2Hf between individuals growing their feathers at the same locality.

origins <- pdRaster(r, data.frame(duck.d2h)[1:2], mask = as(breeding.range, 'Spatial'), genplot = F)

Each resulting probability surfaces is normalized to sum to 1, which we can check:

cellStats(origins[[10]], 'sum') # Posterior probabilities across a single raster layer should sum to 1
[1] 1

After applying the above code, we now have a Rasterstack (i.e., series of raster layers with identical extent and resolution; origins) where each layer is’ a probability of origin surface for each respective individual within our duck.d2h object (n = 41). For example, this is what the surface looks like for black duck 3.

plot(origins[[3]], col = custom.palette(8))
plot(st_geometry(breeding.range), add = T)
plot(st_geometry(northamerica), add = T)

But what if have have prior knowledge that can inform


5.1 - Prior Probabilities of Origin

Priors allow us to incorporate known information about the system into these assignments. For example, we can include genetic information into the assignment where the genetic similarity between an assigned individual and different breeding regions are used to refine the region of origin. These probabilities are incorporated in the likelihood-based assignment methods through Bayes Rule:

\[f_x = \frac{f(y|\mu_c,\sigma_c)f_{prior}}{\sum_{i}{f(y|\mu_c,\sigma_c)f_{prior}}}\]

In practical terms, these priors are spatially explicit probability surfaces (in raster format) that can be incorporated into the pdRaster(…) function through the argument “prior = surface”. To see how this effects the probability surfaces, we’ll create a theoretical prior. We’ll use the province/state surface to represent different breeding regions and we’ll assign a probability to those regions.

To get these prior surfaces, we can start with polygonal data where each shape has the probability as an attribute/variable. Because we are making this up, we have to mannually assign these values to our province/state shapefile, which is easy enough. The values that I’ve chosen ar

prior <- northamerica.states
prior$prob[prior$name == "Ontario"] <- 0.1
prior$prob[prior$name == "Québec"] <- 0.4
prior$prob[prior$name %in% c("New Brunswick","Nova Scotia","Prince Edward Island")] <- 0.2
prior$prob[prior$name %in% c("Newfoundland and Labrador")] <- 0.3
prior$prob[prior$admin == "United States of America"] <- 0

Also, note that this prior is the same for all individuals, in this case. We could develop a RasterStack with a unique prior for each individual. This would probably be the case if we were doing genetics and had genetic information for all sampled indiviudals, but a standard prior could be necessary if we only had information for a given region or cohort.

Once we have our probabilities assigned to the poylgons, we need to rasterize them using the function rasterize(…) which converts the polygons into raster data with the same resolution and extent as the input raster. We still need to mask the resulting raster though.

prior <- rasterize(prior, r$isoscape.rescale[[1]], field = "prob") %>% # rasterize the polygons to match the calibrated isoscape
  mask(breeding.range) # mask

plot(prior, col = custom.palette(16))

Now we just need to input the prior into pdRaster.

origins.prior <- pdRaster(r, data.frame(duck.d2h)[1:2], mask = as(breeding.range, 'Spatial'), prior = prior, genplot = F) # assignment with the prior

If we compare the two outputs, selecting the 12th duck this time, we can see the differences. Notice that while the probability is lower in Ontario, it’s not 0. We gave a lower relative probability compared to Quebec and easern Canada, but the resulting probabilies are not 0.

plot(origins[[12]], col = custom.palette(8))
plot(st_geometry(breeding.range), add = T)
plot(st_geometry(northamerica), add = T)

plot(origins.prior[[12]], col = custom.palette(8))
plot(st_geometry(breeding.range), add = T)
plot(st_geometry(northamerica), add = T)

If we are only interested in one individual, this surface can give a good idea where they likely originated, but we are rarely ever interested in one individual. The power of this method relies on examining likely origin across a large group of individuals.

To combine these surfaces to examine likely origins for all of the individuals simultaneously. This can be done by summing all individuals as binary surfaces, produced using an odds ratios.


5.2 - Binary surfaces

Lastly, we can convert these surfaces into binary surfaces using a specified odds ratio (or probability threshold). Here we use the function qtlRaster(…) to determine the upper 66% of probable cells, in terms of probability density.

An odds ratio describes the relative strength of an individual originating within a given area compared to them originating outside of that area, given the posterior probability distribution. A commonly used odds ratio is 2:1. When applied, the resulting region of origin contains cells that represent the upper 66% of probability density while the cells outside this region represent the lower 33% of the probability density.

Using this approach, we select the upper 66.66% (i.e., 2:1 odds) of ‘estimated probabilities of origin’ for each individual and code these as 1 (i.e., likely origin) and all others as 0 (i.e., unlikely origin). This ratio can be changed depending on how conservative you want to be. The larger the percentage (e.g., 3:1 or 75%; 4:1 or 80%; 9:1 or 90%) the larger (i.e., more cells), and less precise, each region of likely origin will be, but the more likely that the region contains the true origin of that individual. Previous studies have found that 2:1 or 3:1 represent a good compromise between precision and accuracy.

First, we manually set the odds ratio by assigning a cumulative probability bound to the odds object. This value can be changed to represent any desired odds ratio. For 2:1 odds enter 0.67, for 3:1 odds enter 0.75, for 4:1 odds enter 0.80, for 9:1 odds enter 0.90, for 19:1 enter 0.95.

odds <- 2/3 # select upper 66% of cells

binary.origins <- qtlRaster(origins, threshold = odds, thresholdType = "prob", genplot = F)

Comparing this binary surface to the probability surface for the same individual, we can see that the region of highest probability density is now populated by 1’s and all other cells are 0’s.

par(mfrow = c(1,2))
plot(origins[[3]], col = custom.palette(8)) 
plot(st_geometry(northamerica), add = T)

plot(binary.origins[[3]], col = custom.palette(8))
plot(st_geometry(northamerica), add = T)

Now we can sum all of these surfaces to create our final raster surface where the value at each a given represents the number of individuals probabalistically assigned to that cell, given the odds ratio of 2:1.

origins <- calc(binary.origins, sum)

6 - Outputs

At this point, you have your completed assignment, but it is important to save 2 things: (1) the raster file and (2) figure.

6.1 - Raster file

For the raster file, we can use the writeRaster(…) function, saving the surface as an .ascii file. Now this file can be easily loaded into R, or any other GIS software. In theory, you could even load this file to modify any exported plots without rerunning the above methods.

writeRaster(origins, "ABDU_origins", format = "ascii", overwrite = TRUE) 

6.2 - Figure

Lastly, we can visualize and export a plot of this surface. Rather than using the basic plot(…) function, we can use the function levelplot(…) available from the RasterVis library. This package provides visualization methods for quantitative and categorical raster data.

Similar to ggplot2, you can add additional graphical layers using the layer(sp.polygons(…)) functions (or replace sp.polygons with sp.points(…) and sp.lines(…)).

(p1 <- levelplot(origins, col.regions = custom.palette(16), margin=FALSE, 
          legend=list(top=list(fun=grid::textGrob("Count", y=0.3, x=1.09)))) +
    latticeExtra::layer(sp.polygons(as(northamerica, "Spatial"), size = 0.5)) +
    latticeExtra::layer(sp.polygons(as(breeding.range, "Spatial"), fill = NA, alpha = 1, col = "black", lwd = 1.5, lty = 3)))

Then we can export this plot using the png(…) function.

png(filename = "ABDU_origins.png", units = "in", width = 6, height = 5, res=1200)  
p1
invisible(dev.off())

7 - Miscellaneous Tips

Below are a few extra pointers that can hopefully answer some questions that we don’t have time to fully explore. None of the below code runs or makes logical sense, but could be inserted above if the object names are changed.

7.1 What if you just want to use the calibration equation?

If you don’t have raw calibration data, we could use AssignR with a lot of tweaking, but isocat is much easier. Because there are no explicitt calibration functions in isocat, using the linear equation to convert the isoscape before inputting the surface into the assignment function (ccc) is the workflow anyways.

library(isocat)

cal.sd # SD of residuals from the calibration equation
cal.slope # slope for tha calibration equation
cal.intercept # intercept for tha calibration equation

cal.iso <- isoscape * cal.slope + cal.intercept

assignmentModels <- isotopeAssignmentModel(ID = samples$ID, 
                                           isotopeValue = samples$isotopeValue, 
                                           SD_indv = cal.sd, 
                                           precip_raster = cal.iso, 
                                           precip_SD_raster = cal.iso.sd)

7.2 What if you don’t have an error surface?

Here we have to input a placeholder error surface in for the precip_SD_raster. Personally, I use isocat, because the process is more straight-forward. For isocat, we input an error surface populated with zero’s (i.e., no error). Obviously this isn’t reality, but if we don’t have the error surface, this is a work around.

In isocat’s workflow, you specify the calibration and isoscape error separately when performing the assignment, so you can skip the calRaster(…) step and just apply the calibration equation manually.

assignmentModels <- isotopeAssignmentModel(ID = samples$ID, 
                                           isotopeValue = samples$isotopeValue, 
                                           SD_indv = cal.sd, 
                                           precip_raster = cal.iso, 
                                           precip_SD_raster = (cal.iso * 0)) # Add the error surface, which is just the mean surface with 0's

7.3 Clustering based on surface similarity?

We didn’t go over this functionality here, but isocat provides a great suite of tools to look at how spatial probability surfaces cluster based on spatial similarity. Check the isocat vignettes and paper here. While AssignR can be more user friendly, as many of the computation stuff happens behind the scenes, the functionality can be more rigid due to the specific object classes and necessary inputs (i.e., calibration data, error surface). Isocat is a great option to explore if you want to do some of the data manipulation yourself, or if you want to use the clustering methods.

7.4 Multiple isotopes?

Again, we didn’t go over this functionality, but we can easily apply multiple isotopes/isoscapes in the same workflow. The second isoscape wouldn’t be hydrogen, because that’s likely our primary isoscape, so we don’t need to follow the same calibration procedures. How we calibrate this isoscape and whether we need to calibrate it at all depends on the surface. This is beyond the scope of this workshop, so you need to determine this on an isotope-by-isotopes basis.

One issue here goes back to tip #1, because we often do not have an uncertainty surface for a second isoscape… But we can use the same logic as above.

The only thing to be careful of here is that the order of the sample isotopes in the sample dataframe matches the order of the isoscapes in the RasterStack. Otherwise, it’s the same pdRaster(…) function.

# We'll assume the first isotope is the d2H data from above

isoscape.2 <- rast("Fakedata.tif") %>% 
  projectRaster(crs = CRS(SRS_string = 'EPSG:4326')) %>%
  mask(breeding.range) %>% # Mask to breeding range
  crop(extent(breeding.range)) %>% # Crop to breeding range
  raster::raster() # Convert back to raster object

isoscape.2 <- stack(isoscape.2, isoscape.2 * 0) # Add zero-error surface

isoscape.stack <- stack(isoscape.1, isoscape.2) # Stack two isotopes (and error surfaces) together
names(isoscape.stack) <- c('iso1','iso1.sd','iso2','iso2.sd')

p <- pdRaster(isoscape.stack, sample.df)

8 - References

Baldassarre GA (2014) Ducks, geese, and swans of North America. JHU Press.

Bowen GJ, Wassenaar LI, Hobson KA (2005) Global application of stable hydrogen and oxygen isotopes to wildlife forensics. Oecologia 143:337–348. Link

Campbell CJ, Fitzpatrick MC, Vander Zanden HB, Nelson DM (2020) Advancing interpretation of stable isotope assignment maps: comparing and summarizing origins of known-provenance migratory bats. Anim Migr 7:27-41. Link

Hobson KA, Van Wilgenburg SL, Wassenaar LI, Larson K (2012) Linking hydrogen (δ2H) isotopes in feathers and precipitation: sources of variance and consequences for assignment to isoscapes. PLOS ONE 7:e35137. Link

Hobson KA, Wunder MB, Van Wilgenburg SL, et al. (2009) A method for investigating population declines of migratory birds using stable isotopes: origins of harvested Lesser Scaup in North America. PLOS ONE 4:e7915. Link

Kusack JW, Tozer DC, Schummer ML, Hobson KA (2023) Origins of harvested American black ducks: stable isotopes support the flyover hypothesis. J Wildl Manag 87:e22324. Link

Ma C, Vander Zanden HB, Wunder MB, Bowen GJ (2020) assignR: an R package for isotope-based geographic assignment. Methods in Ecol Evol 11:996–1001. Link

Magozzi, S, Bataille, CP, Hobson, KA, et al. (2021) Calibration chain transformation improves the comparability of organic hydrogen and oxygen stable isotope data. Methods Ecol Evol 12:732–747. Link

van Dijk JGB, Meissner W, Klaassen M (2014) Improving provenance studies in migratory birds when using feather hydrogen stable isotopes. J Avian Biol 45:103–108. Link

van Wilgenburg SL, Hobson KA (2011) Combining stable-isotope (δD) and band recovery data to improve probabilistic assignment of migratory birds to origin. Ecological Applications 21:1340–1351. Link

Vander Zanden HB, Wunder MB, Hobson KA, et al. (2014) Contrasting assignment of migratory organisms to geographic origins using long-term versus year-specific precipitation isotope maps. Methods Ecol Evol 5:891-900. Link

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8   
[3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rdryad_1.0.0        plotly_4.10.1       ggplot2_3.4.0      
 [4] RColorBrewer_1.1-3  dplyr_1.0.10        rnaturalearth_0.3.2
 [7] rasterVis_0.51.5    lattice_0.20-45     sf_1.0-9           
[10] terra_1.6-53        raster_3.6-14       sp_1.6-0           
[13] assignR_2.2.1      

loaded via a namespace (and not attached):
 [1] nlme_3.1-160             httr_1.4.4               tools_4.2.2             
 [4] bslib_0.4.2              utf8_1.2.2               rgdal_1.6-4             
 [7] R6_2.5.1                 KernSmooth_2.23-20       mgcv_1.8-41             
[10] DBI_1.1.3                lazyeval_0.2.2           colorspace_2.1-0        
[13] withr_2.5.0              tidyselect_1.2.0         rnaturalearthdata_0.1.0 
[16] curl_5.0.0               compiler_4.2.2           cli_3.4.1               
[19] labeling_0.4.2           triebeard_0.4.1          sass_0.4.4              
[22] scales_1.2.1             classInt_0.4-8           hexbin_1.28.2           
[25] proxy_0.4-27             rappdirs_0.3.3           stringr_1.5.0           
[28] mvnfast_0.2.7            digest_0.6.31            foreign_0.8-83          
[31] rmarkdown_2.20           jpeg_0.1-10              pkgconfig_2.0.3         
[34] htmltools_0.5.4          fastmap_1.1.0            highr_0.10              
[37] htmlwidgets_1.6.1        rlang_1.0.6              rstudioapi_0.14         
[40] httpcode_0.3.0           prettydoc_0.4.1          jquerylib_0.1.4         
[43] generics_0.1.3           zoo_1.8-11               jsonlite_1.8.4          
[46] crosstalk_1.2.0          zip_2.2.2                magrittr_2.0.3          
[49] Matrix_1.5-3             interp_1.1-3             geosphere_1.5-18        
[52] Rcpp_1.0.10              munsell_0.5.0            fansi_1.0.4             
[55] lifecycle_1.0.3          stringi_1.7.12           yaml_2.3.6              
[58] grid_4.2.2               maptools_1.1-6           parallel_4.2.2          
[61] deldir_1.0-6             splines_4.2.2            knitr_1.41              
[64] pillar_1.8.1             codetools_0.2-18         crul_1.3                
[67] glue_1.6.2               evaluate_0.20            latticeExtra_0.6-30     
[70] hoardr_0.5.3             data.table_1.14.6        png_0.1-8               
[73] vctrs_0.5.2              urltools_1.7.3           gtable_0.3.1            
[76] purrr_1.0.1              tidyr_1.2.1              assertthat_0.2.1        
[79] cachem_1.0.6             xfun_0.36                mime_0.12               
[82] e1071_1.7-12             rnaturalearthhires_0.2.0 class_7.3-20            
[85] viridisLite_0.4.1        tibble_3.1.8             units_0.8-1             
[88] ellipsis_0.3.2